From Policies to Influences: A Framework for Nonlocal Abstraction in Transition-Dependent Dec-POMDP Agents
Authors: S. Witwicki and E. Durfee
Abstract
Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) are powerful theoretical models for deriving optimal coordination policies for agent teams in uncertain environments. Unfortunately, their NEXP-complete solution complexity [3] presents significant challenges when applying them to real-world problems, particularly those involving teams of more than two agents. Inevitably, the policy space becomes intractably large as agents coordinate joint decisions that are based on dissimilar beliefs about an uncertain world state and that involve performing actions with stochastic effects. Our work directly confronts this policy-space explosion with the intuition that, instead of coordinating all policy decisions, agents need only coordinate abstractions of their policies that constitute the essential influences they exert on each other.

As a running example, consider the problem shown in Figure 1, involving two interacting rover agents (among a team of several others) exploring the surface of Mars. The agents perform various tasks, each constrained to take place within a window of execution, with nondeterministic duration (D) and quality (Q) outcomes, and in performing their tasks they may alter the outcomes of other agents' tasks. Here, agent 1 may choose to visit and prepare research site C, which will (in expectation) make agent 2's analysis of site C quicker and more valuable. This problem can be expressed as a transition-dependent Dec-POMDP.

[Figure 1 omitted: the two rovers' task networks, including agent 2's "Analyze C" task and its duration/quality outcomes.]

Cite as: From Policies to Influences: A Framework for Nonlocal Abstraction in Transition-Dependent Dec-POMDP Agents (Extended Abstract), S. Witwicki and E. Durfee, Proc. of 9th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS 2010), van der Hoek, Kaminka, Lespérance, Luck and Sen (eds.), May 10–14, 2010, Toronto, Canada. Copyright © 2010, International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.
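To make the running example concrete, here is a minimal Python sketch, where all class names, task names, and probability numbers are illustrative assumptions rather than the paper's actual model. It shows tasks with execution windows and stochastic duration/quality outcomes, where one agent's choice shifts the outcome distribution of another agent's task:

```python
import random
from dataclasses import dataclass

@dataclass
class Task:
    """A task with an execution window and stochastic duration/quality outcomes."""
    name: str
    window: tuple   # (earliest start time, latest finish time)
    outcomes: list  # [(probability, duration, quality), ...]

    def execute(self):
        """Sample one (duration, quality) outcome."""
        r, acc = random.random(), 0.0
        for p, d, q in self.outcomes:
            acc += p
            if r <= acc:
                return d, q
        return self.outcomes[-1][1:]

# Agent 1's "Prepare C" task (hypothetical numbers).
prepare_c = Task("Prepare C", window=(0, 5), outcomes=[(1.0, 2, 1)])

# Agent 2's "Analyze C" outcome distribution depends on whether site C
# was prepared: preparation makes analysis quicker and more valuable.
analyze_c_unprepared = Task("Analyze C", window=(3, 10),
                            outcomes=[(0.6, 5, 1), (0.4, 6, 0)])
analyze_c_prepared = Task("Analyze C", window=(3, 10),
                          outcomes=[(0.8, 3, 2), (0.2, 4, 1)])
```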
Similar papers
Influence-Based Policy Abstraction for Weakly-Coupled Dec-POMDPs
Decentralized POMDPs are powerful theoretical models for coordinating agents' decisions in uncertain environments, but the generally intractable complexity of optimal joint policy construction presents a significant obstacle to applying Dec-POMDPs to problems where many agents face many policy choices. Here, we argue that when most agent choices are independent of other agents' choices, much of...
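As a rough illustration of the influence idea (the feature, numbers, and function names below are hypothetical assumptions, not this paper's formulation): an agent can summarize each candidate policy by the distribution it induces over the one nonlocal feature a neighbor cares about, and the neighbor can evaluate its best response against that summary instead of against the full policy.

```python
# Agent 1 abstracts each candidate policy down to its influence: the
# probability that site C is prepared by each time agent 2 might start.
influences = {
    "prepare_early": {3: 0.9, 5: 0.95},
    "skip_site_c": {3: 0.0, 5: 0.0},
}

def best_response_value(influence):
    """Agent 2's best expected quality for 'Analyze C' (hypothetical:
    quality 2 if the site is prepared when analysis starts, else 1)."""
    return max(p * 2 + (1 - p) * 1 for p in influence.values())

# Coordinate over influence summaries instead of full joint policies.
best = max(influences, key=lambda name: best_response_value(influences[name]))
print(best, best_response_value(influences[best]))
```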
A POMDP Framework to Find Optimal Inspection and Maintenance Policies via Availability and Profit Maximization for Manufacturing Systems
Maintenance can either increase or decrease a system's availability, so it is worthwhile to evaluate a maintenance policy from both the cost and availability points of view, simultaneously and according to the decision maker's priorities. This study proposes a Partially Observable Markov Decision Process (POMDP) framework for a partially observable and stochastically deteriorating system...
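The core machinery underlying any such POMDP framework is the Bayesian belief update over the hidden deterioration state given noisy inspection readings. A minimal sketch, with made-up state names and transition/observation probabilities rather than this paper's model:

```python
# Belief update: b'(s') is proportional to O(o|s') * sum_s T(s'|s) * b(s).
STATES = ["good", "worn", "failed"]

# T[s][s']: deterioration while operating (hypothetical numbers).
T = {"good":   {"good": 0.8, "worn": 0.15, "failed": 0.05},
     "worn":   {"good": 0.0, "worn": 0.7,  "failed": 0.3},
     "failed": {"good": 0.0, "worn": 0.0,  "failed": 1.0}}

# O[s][o]: probability of an inspection reading given the true state.
O = {"good":   {"ok": 0.9,  "alarm": 0.1},
     "worn":   {"ok": 0.4,  "alarm": 0.6},
     "failed": {"ok": 0.05, "alarm": 0.95}}

def belief_update(belief, obs):
    """One step of the POMDP belief filter."""
    predicted = {s2: sum(T[s][s2] * belief[s] for s in STATES) for s2 in STATES}
    unnorm = {s2: O[s2][obs] * predicted[s2] for s2 in STATES}
    z = sum(unnorm.values())
    return {s: p / z for s, p in unnorm.items()}

b = {"good": 1.0, "worn": 0.0, "failed": 0.0}
b = belief_update(b, "alarm")  # belief after observing an alarm
```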
Decentralized POMDPs
This chapter presents an overview of the decentralized POMDP (Dec-POMDP) framework. In a Dec-POMDP, a team of agents collaborates to maximize a global reward based only on local information. This means that agents do not observe a Markovian signal during execution, and therefore the agents' individual policies map from histories to actions. Searching for an optimal joint policy is an extremely hard...
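To make the last point concrete, here is a sketch of one agent's individual policy as a mapping from its local observation history to an action (the observation and action names are illustrative, loosely in the style of the multi-agent tiger benchmark):

```python
# In a Dec-POMDP, an agent never observes a Markovian state, so its
# policy maps its own observation history to an action.
policy = {
    (): "listen",
    ("hear-left",): "listen",
    ("hear-left", "hear-left"): "open-right",
    ("hear-left", "hear-right"): "listen",
}

def act(history):
    # Fall back to a safe default action for unlisted histories.
    return policy.get(tuple(history), "listen")

print(act(["hear-left", "hear-left"]))  # -> open-right
```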
Informed Initial Policies for Learning in Dec-POMDPs
Decentralized partially observable Markov decision processes (Dec-POMDPs) offer a formal model for planning in cooperative multi-agent systems where agents operate with noisy sensors and actuators and only local information. While many techniques have been developed for solving Dec-POMDPs exactly and approximately, they have been primarily centralized and reliant on knowledge of the model parameters...
Periodic Finite State Controllers for Efficient POMDP and DEC-POMDP Planning
Applications such as robot control and wireless communication require planning under uncertainty. Partially observable Markov decision processes (POMDPs) plan policies for single agents under uncertainty, and their decentralized versions (DEC-POMDPs) find a policy for multiple agents. The policy in infinite-horizon POMDP and DEC-POMDP problems has been represented as finite state controllers (FSCs)...
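A minimal sketch of the FSC representation described here (the structure and labels are illustrative assumptions): each controller node fixes an action, and the received observation selects the next node. A periodic controller would further arrange the nodes in layers visited in a fixed cyclic order.

```python
# A finite state controller: node -> action, (node, observation) -> next node.
actions = {0: "listen", 1: "open-right"}
transitions = {
    (0, "hear-left"): 1,
    (0, "hear-right"): 0,
    (1, "hear-left"): 0,
    (1, "hear-right"): 0,
}

def run(observations, node=0):
    """Execute the controller on a sequence of observations."""
    taken = []
    for obs in observations:
        taken.append(actions[node])
        node = transitions[(node, obs)]
    return taken

print(run(["hear-left", "hear-right", "hear-left"]))
```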